The monoclonal protein, or M-spike, remains a cornerstone of clinical decision-making in multiple myeloma (MM), serving as a key biomarker for diagnosis, response assessment, relapse detection, and longitudinal disease monitoring. Even as newer technologies such as mass spectrometry and next-generation measurable residual disease (MRD) assays gain traction, the M-spike is still the most important surrogate of tumor burden in MM. Accurate capture and interpretation of M-spike values are therefore critical not only for patient management but also for the development of AI models aimed at risk prediction, response assessment, and relapse forecasting.

However, M-spike values are often recorded in free-text or semi-structured formats within electronic health records (EHRs), complicating their extraction and normalization for machine-learning applications. Inconsistent documentation, variable units, and narrative-based reporting can introduce systematic noise or bias into model training, reducing generalizability and limiting deployment in real-world settings. To enable trustworthy and scalable AI tools in MM, precise and structured extraction of M-spike data is essential, both to ensure model accuracy and to reflect the clinical reality of routine care across diverse healthcare environments. Here, we analyzed structured M-protein coding within the HealthTree Registry to quantify gaps, characterize institutional and EHR-system variability, and examine the socioeconomic distribution of coded data. Our goal was to assess how incomplete coding constrains real-world-data algorithm development and introduces systemic bias.

Methods: We evaluated 1,725 MM patients across 384 institutions in the HealthTree Foundation Registry. Gold-standard M-protein entries were defined by the presence of LOINC codes 51435-6, 35559-4, or 33358-3 in serum or plasma results. Coding was assessed both overall and within a 3-month window of MM diagnosis. Facilities were categorized as Academic/Research, Community, or Integrated Network centers based on National Cancer Database definitions. EHR systems at connected facilities included Epic, Cerner, and the Veterans Affairs (VA) system. Socioeconomic status was approximated using median household income by ZIP code (US Census, 2023). We compared LOINC coding prevalence across EHR systems, facility types, and income strata to evaluate the operational and equity-related implications of coding gaps.
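To make the extraction rule concrete, below is a minimal pandas sketch of the gold-standard filter, assuming a flat table of lab results; the column names (patient_id, loinc_code, specimen, result_date, diagnosis_date) and the symmetric 90-day window are illustrative assumptions, not the registry's actual schema.

```python
import pandas as pd

# Gold-standard LOINC codes for structured M-protein results (see Methods)
MPROTEIN_LOINC = {"51435-6", "35559-4", "33358-3"}

def flag_coded_mprotein(labs: pd.DataFrame, window_days: int = 90) -> pd.DataFrame:
    """Per patient: count of gold-standard entries, and whether any falls near diagnosis.

    Assumes `result_date` and `diagnosis_date` are datetime64 columns; the
    +/- 90-day window is one assumed reading of the abstract's 3-month window.
    """
    hits = labs[
        labs["loinc_code"].isin(MPROTEIN_LOINC)
        & labs["specimen"].str.lower().isin(["serum", "plasma"])
    ].copy()
    days_from_dx = (hits["result_date"] - hits["diagnosis_date"]).dt.days
    hits["near_dx"] = days_from_dx.abs() <= window_days
    flags = hits.groupby("patient_id")["near_dx"].agg(
        n_coded="size", coded_near_dx="any"
    )
    # Patients absent from `flags` had no structured M-protein entry at all.
    return flags
```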

Results: Only 950 of 1,725 patients (55.1%) had at least one coded M-protein value matching our gold standard; within a 3-month window of MM diagnosis, this dropped to 20.9% (n=360). Coding prevalence varied sharply by EHR platform: Epic (47.8%), Cerner (21.2%), and VA (23.5%). By facility type, Academic/Research centers coded 43.4% of patients, Community centers 52.9%, and Integrated Networks 51.9%. Among patients without structured coded entries, common patterns included the use of custom facility codes in place of standard LOINC identifiers, undefined codes paired with vague descriptors (e.g., “M-Protein”), and values embedded in unstructured fields such as medical history or narrative free-text notes. These documentation practices contributed to inconsistent data extraction and limited the feasibility of automated M-protein retrieval. Among patients with ZIP codes available, those with coded M-protein values (n=740) had a 7.89% higher median household income than those without (n=740) ($105,911 vs. $98,166, p<0.001). This difference suggests that structured data infrastructure may preferentially reflect higher-income populations.
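For reference, the 7.89% figure follows directly from the two reported medians; a one-line check (values copied from the Results above):

```python
# Relative difference in median household income: (105,911 - 98,166) / 98,166
coded, uncoded = 105_911, 98_166
print(f"{(coded - uncoded) / uncoded:.2%}")  # -> 7.89%
```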

Conclusions: Structured M-protein coding remains limited and inconsistently applied across healthcare systems, with only 55.1% of patients having entries that followed the LOINC coding standard. Most patients lacked coded entries at or near diagnosis, precisely when such values are most needed for risk and outcome assessment. Heterogeneity across EHR systems and care settings challenges the scalability of automated tools across institutions. Moreover, socioeconomic disparities in data structure raise critical concerns about algorithmic bias. Without addressing these upstream gaps, decision-support models risk encoding systemic inequities and performing best in populations already advantaged by data infrastructure.
